Story memo

Patrick McCaslin, patrick.mccaslin@thebaltimorebanner.com, June 2, 2025

Subject:

This story looks into the moderate, positive relationship between per capita income and the local public school’s performance on Maryland’s state test, the MCAP. The story will discuss the effect of The Blueprint for Maryland’s Future on ameliorating this issue.

Context:

State test scores have largely stagnated over the past few years, regardless of race and location despite the state investing millions of dollars into schools. For low-scoring districts, this indicates that schools are struggling to catch up and income could help illuminate the why. Especially now, the economy is an issue that is particularly relevant to Americans and showing its effect on educational outcomes could add to the picture. Additionally, the Blueprint has provisions for helping districts in poverty that have recently been called into question. This will be relevant as the state approaches its gubernatorial election and other local elections.

Relevance:

In early April, Maryland legislators concluded a standoff with Gov. Wes Moore surrounding the Blueprint. They gutted Moore’s bill to defund the Blueprint over the course of four years. Among other requests, it specifically called to defund the provision that increased funding for students who live in areas with concentrated poverty.

Understanding income’s effect on the performance of schools may provide important context to understanding the potential benefit of the Blueprint.

Target audience:

Primarily parents of students in public schools. Secondarily, politically-minded readers who may have a strong opinion about allocating such a large part of the budget to the Blueprint.

Potential sources:

  • Legislators who lobbied to keep the Blueprint fully funded
  • Administrator/teacher/student at a school that performed particularly well on the MCAP and are based in a high income area. What are their keys to success?
  • Administrator/teacher/student at a school that performed poorly on the MCAP and is based in a low income area.
  • Person at an outlier school (e.g. high income/low score or low income/high score)

Potential impact:

A demonstrated positive correlation between income and MCAP test score may influence voters’ opinions on the Blueprint and affect how the state allocates money in the future. It could also illuminate an important underlying factor in educational success, especially if it seems to affect certain townships more than others.

Now for the surprising observation . . .

Income has a moderate effect on test scores

Comparing the local per capita income with the percentage of the school that scores proficient reveals only a moderate correlation. Mathematically, this means income can explain ~25% of the variability in percentage scoring proficient.

Made with Flourish
Made with Flourish

Income has little effect on recovering from COVID education deficits

Comparing the local per capita income with the change in the percentage of the school that scores proficient reveals little correlation. Mathematically, this means income can explain ~25% of the variability in percentage scoring proficient.

Made with Flourish
Made with Flourish

So what now?

Originally, I was going to complete a geospatial analysis for a more accurate picture of the income vs scores relationship. However, now I’m wondering if I should focus the story on the surprisingly small effect it has on test scores?

Or, if I should completely pivot to other factors such as race or geography. Perhaps stratifying these results by county would reveal different impacts depending on the county? I noted stronger results in Montgomery County for instance.

Documentation of project

Load software libraries

#install.packages("tidyverse")
#install.packages("janitor")
#install.packages("tidycensus")
#install.packages("stringr")


library(tidycensus)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.4     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(stringr)

Function to clean, and sort the test score data at school level

data_cleaner <- function(df) {
  
#If *, delete from df.
  df <- df |>
  filter(Proficient.Pct != "*")
  
#Convert strings stored in Proficient.Pct to doubles. When <5, convert to 5. When >95 convert to 95. Pasted and modified from ChatGPT.

  df <- df |> 
  mutate(
    Proficient.Pct = case_when( #new Proficient.Pct
      Proficient.Pct == "<= 5.0" ~ 5L,
      Proficient.Pct == ">= 95.0" ~ 95L,
      suppressWarnings(!is.na(as.numeric(Proficient.Pct))) ~ as.numeric(Proficient.Pct),
      TRUE ~ 0
    )
  )

  df <- df |>
  filter(str_ends(str_trim(Assessment), fixed("All Grades"))) |> 
  arrange(desc(Proficient.Pct))
    
  return(df)
}

Load other test score data

 ela_math_scores_2019 <- read.csv("../MCAP Scores/MCAP_ELA_MATH_2019.csv")

ela_scores_2021 <- read.csv("../MCAP Scores/2021_MCAP_ELA_Scores.csv")

math_scores_2021 <- read.csv("../MCAP Scores/2021_MCAP_MATH_Scores.csv")

ela_scores_2023 <- read.csv("../MCAP Scores/2023_MCAP_ELA_Scores.csv")

math_scores_2023 <- read.csv("../MCAP Scores/2023_MCAP_MATH_Scores.csv")

# ela_scores_2024 <- read.csv("../MCAP Scores/2024_MCAP_ELA_Scores.csv")
# 
# math_scores_2024 <- read.csv("MCAP Scores/2024_MCAP_MATH_Scores.csv")

Now cleaning and sorting said data

math_scores_2023 <- data_cleaner(math_scores_2023)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Proficient.Pct = case_when(...)`.
## Caused by warning:
## ! NAs introduced by coercion
ela_scores_2023 <- data_cleaner(ela_scores_2023)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Proficient.Pct = case_when(...)`.
## Caused by warning:
## ! NAs introduced by coercion

Find the relationship between school and per capita income

#Gets per capita income data by zip code for the entire U.S. based off of 2023 ACS
percap_national_income2023 <- get_acs(geography = "zcta", 
              variables = "B19301_001", 
              year = 2023) |> 
              rename(Income_per_cap = estimate)
## Getting data from the 2019-2023 5-year ACS
#Read in df which relates all schools to zip codes.
schools_zips <- read.csv("../schools_zips.csv")

#Filter to only contain zip codes that are found on the list of zip codes, instead of the entire U.S.

zip_incomes <- percap_national_income2023 |> 
  filter(GEOID %in% schools_zips$ZIP)
  
#Now join one to many. We have a df () which contains school names and zip codes. We have another df () which contains zip codes and income.

#First change column name from GEOID to ZIP to allow for join.

zip_incomes <- zip_incomes |> rename(ZIP = GEOID)

#And join! Now we have schools and income connected

schools_income <- left_join(schools_zips, zip_incomes, by = "ZIP")

Normalize the school name vector function

normalize <- function(name) {
  
  name |> 
    
  # Move parentheses content to the front, e.g., "Whitman (Walt) High" -> "Walt Whitman High" Taken from ChatGPT.
  str_replace("^(.+?)\\s*\\((.+?)\\)(.*)$", "\\2 \\1\\3") |>
    
# Remove anything starting with ' #' and after (e.g., "# 144")
  str_replace("\\s+#.*$", "") |> 
  
  str_squish()

}

Now find the relationship between zip codes and schools

schools_income$School.Name <- sapply(schools_income$SCHOOL_NAME, normalize)

#Join income with test scores
income_ela_scores <- ela_scores_2023 |> 
  inner_join(schools_income, by = "School.Name")
## Warning in inner_join(ela_scores_2023, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 14 of `x` matches multiple rows in `y`.
## ℹ Row 513 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
income_math_scores <- math_scores_2023 |> 
  inner_join(schools_income, by = "School.Name")
## Warning in inner_join(math_scores_2023, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 38 of `x` matches multiple rows in `y`.
## ℹ Row 193 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

How is the correlation?

no_outliers <- income_ela_scores |> 
filter(income_ela_scores$Income_per_cap < 80000)

#comes out to 0.477671

income_ela_scores <- income_ela_scores |>
  filter(!is.na(income_ela_scores$Income_per_cap))
cor(income_ela_scores$Income_per_cap, income_ela_scores$Proficient.Pct)
## [1] 0.5299482
#comes out to 0.529948

no_outliers_math <- income_math_scores |> 
filter(income_math_scores$Income_per_cap < 80000)

cor(no_outliers_math$Income_per_cap, no_outliers_math$Proficient.Pct)
## [1] 0.3920974
#comes out to 0.392

income_math_scores <- income_math_scores |>
  filter(!is.na(income_math_scores$Income_per_cap))
cor(income_math_scores$Income_per_cap, income_math_scores$Proficient.Pct)
## [1] 0.4711466
#comes out to 0.4711

cor(no_outliers$Income_per_cap, no_outliers$Proficient.Pct)
## [1] 0.477671
difference_calculator <- function(df,df2) {
  df <- clean_names(df)

df <- df |> 
  mutate(
    proficient_pct = suppressWarnings(as.numeric(proficient_pct)),
    tested_count = suppressWarnings(as.numeric(tested_count))
  ) |> 
  filter(!is.na(proficient_pct), !is.na(tested_count))

df <- df %>%
  mutate(assessment = case_when(
    str_detect(assessment, "Mathematics|Algebra|Geometry") ~ "Math",
    str_detect(assessment, "English") ~ "English",
    TRUE ~ "Other"
  )) |> 
  group_by(school_name, assessment) |> 
  summarise(
    Weighted_Proficient = sum(proficient_pct * tested_count, na.rm = TRUE) / sum(tested_count, na.rm = TRUE),
    .groups = "drop"
  )

#rename for join later

df <- df |> 
  rename(School.Name = school_name)

#join
df_income_scores <- df |> 
  left_join(df2, by = "School.Name")

#calculate difference
df_income_scores$diference <- df_income_scores$Proficient.Pct - df_income_scores$Weighted_Proficient

#filter NAs
df_income_scores <- df_income_scores |> 
  filter(!is.na(df_income_scores$diference))

return(df_income_scores)
}

a <- difference_calculator(ela_scores_2021,ela_scores_2023)
b <- difference_calculator(math_scores_2021,math_scores_2023)

a <- a |> 
  inner_join(schools_income, by = "School.Name")
## Warning in inner_join(a, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 66 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
b <- b |> 
  inner_join(schools_income, by = "School.Name")
## Warning in inner_join(b, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 61 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
ela_math_scores_2019 <- ela_math_scores_2019 %>%
  mutate(assessment = case_when(
    str_detect(Assessment, "Mathematics|Algebra|Geometry") ~ "Math",
    str_detect(Assessment, "English") ~ "English",
    TRUE ~ "Other"
  ))

ela_math_scores_2019 <- clean_names(ela_math_scores_2019)

ela_math_scores_2019 <- ela_math_scores_2019 %>%
  mutate(
    proficient_pct = suppressWarnings(as.numeric(proficient_pct)),
    tested_count = suppressWarnings(as.numeric(tested_count))
  ) %>%
  filter(!is.na(proficient_pct), !is.na(tested_count))

test <- ela_math_scores_2019 %>%
  mutate(assessment = case_when(
    str_detect(assessment, "Mathematics|Algebra|Geometry") ~ "Math",
    str_detect(assessment, "English") ~ "English",
    TRUE ~ "Other"
  )) |> 
  group_by(school_name, assessment) |> 
  summarise(
    Weighted_Proficient = sum(proficient_pct * tested_count, na.rm = TRUE) / sum(tested_count, na.rm = TRUE),
    .groups = "drop"
  )

#separate by math/english
test_ela <- test |> 
  filter(str_detect(assessment, "English"))

test_math <- test |> 
  filter(str_detect(assessment, "Math"))

#rename for join later

test_ela <- test_ela |> 
  rename(School.Name = school_name)

test_math <- test_math |> 
  rename(School.Name = school_name)

#join 2019 and 2023
income_ela_scores <- test_ela |> 
  left_join(ela_scores_2023, by = "School.Name")

income_math_scores <- test_math |> 
  left_join(math_scores_2023, by = "School.Name")

#calculate difference
income_ela_scores$diference <- income_ela_scores$Proficient.Pct - income_ela_scores$Weighted_Proficient

income_math_scores$diference <- income_math_scores$Proficient.Pct - income_math_scores$Weighted_Proficient

#filter NAs
income_ela_scores <- income_ela_scores |> 
  filter(!is.na(income_ela_scores$diference))

income_math_scores <- income_math_scores |> 
  filter(!is.na(income_math_scores$diference))

ela_difference <- income_ela_scores |> 
  inner_join(schools_income, by = "School.Name")
## Warning in inner_join(income_ela_scores, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 62 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
math_difference <- income_math_scores |> 
  inner_join(schools_income, by = "School.Name")
## Warning in inner_join(income_math_scores, schools_income, by = "School.Name"): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 58 of `x` matches multiple rows in `y`.
## ℹ Row 735 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

Side quest to get zip codes/income

#Gets per capita income data by zip code for the entire U.S. based off of 2023 ACS
# percap_national_income2023 <- get_acs(geography = "zcta", 
#               variables = "B19301_001", 
#               year = 2023) |> 
#               rename(Income_per_cap = estimate)
# 
# #Read in df which relates all schools to zip codes.
# maryland_zips <- zips$ZCTA5N
#   
# #Filter to only contain zip codes that are found on the list of zip codes, instead of the entire U.S.
# 
# maryland_incomes <- percap_national_income2023 |> 
#   filter(GEOID %in% maryland_zips)